---
title: fastproaudio core
keywords: fastai
sidebar: home_sidebar
summary: "API details."
description: "API details."
nb_path: "00_core.ipynb"
---

Some audio data URLs:

URLs.AUDIOMDPI = 'https://zenodo.org/record/3562442'

URLs.MARCO = URLs.AUDIOMDPI # just a shorthand alias I'm more likely to remember

URLs.SIGNALTRAIN_LA2A_1_1 = 'https://zenodo.org/record/3824876'

URLs.SIGNALTRAIN_LA2A_REDUCED = 'http://hedges.belmont.edu/data/SignalTrain_LA2A_Reduced.tgz'

{% raw %}

zenodo_url_to_data_url[source]

zenodo_url_to_data_url(url)

Parameters:

  • url : <class 'inspect._empty'>
{% endraw %} {% raw %}
print(URLs.MARCO)
print(zenodo_url_to_data_url(URLs.MARCO))
https://zenodo.org/record/3562442
https://zenodo.org/api/files/d6589bb4-d6a6-4bc6-8e51-e6334fafbe3f/AudioMDPI.zip
{% endraw %} {% raw %}
print(URLs.SIGNALTRAIN_LA2A_1_1)
print(zenodo_url_to_data_url(URLs.SIGNALTRAIN_LA2A_1_1))
https://zenodo.org/record/3824876
https://zenodo.org/api/files/df302f12-7355-452e-93d1-b0c9344608f7/SignalTrain_LA2A_Dataset_1.1.tgz
{% endraw %} {% raw %}

get_audio_data[source]

get_audio_data(url)

Parameters:

  • url : <class 'inspect._empty'>
{% endraw %}

Try downloading a sample .tgz file:

{% raw %}
path_st = get_audio_data(URLs.SIGNALTRAIN_LA2A_REDUCED)
path_st
Path('/home/shawley/.fastai/data/SignalTrain_LA2A_Reduced')
{% endraw %}
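`get_audio_data` appears to cache its downloads under `~/.fastai/data` and extract archives there, fastai-style. Here's a minimal sketch of just the extract-and-cache step, using a hypothetical helper `extract_audio_archive` (this is not the actual implementation, and the demo builds its own throwaway archive so no network is needed):

```python
import tarfile, tempfile
from pathlib import Path

def extract_audio_archive(archive: Path, data_dir: Path) -> Path:
    """Hypothetical sketch: extract a .tgz under data_dir (skipping the
    extraction if the destination already exists) and return the folder."""
    dest = data_dir / archive.name.replace('.tgz', '')
    if not dest.exists():                       # simple cache check
        with tarfile.open(archive, 'r:gz') as tf:
            tf.extractall(data_dir)
    return dest

# Demo on a locally-built archive:
tmp = Path(tempfile.mkdtemp())
src = tmp / 'SignalTrain_LA2A_Reduced'; src.mkdir()
(src / 'readme.txt').write_text('hello')
archive = tmp / 'SignalTrain_LA2A_Reduced.tgz'
with tarfile.open(archive, 'w:gz') as tf:
    tf.add(src, arcname=src.name)

path = extract_audio_archive(archive, tmp / 'data')
print(path.name)  # SignalTrain_LA2A_Reduced
```

Calling the helper again returns the cached folder without re-extracting, which is why repeated `get_audio_data` calls in this notebook are cheap.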

And try downloading from a Zenodo URL:

{% raw %}
path_audiomdpi = get_audio_data(URLs.MARCO)
path_audiomdpi
Path('/home/shawley/.fastai/data/AudioMDPI')
{% endraw %}

Let's use this data as an example and take a look at it:

{% raw %}
path_audiomdpi.ls()
(#4) [Path('/home/shawley/.fastai/data/AudioMDPI/LeslieWoofer'),Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn'),Path('/home/shawley/.fastai/data/AudioMDPI/license.txt'),Path('/home/shawley/.fastai/data/AudioMDPI/6176ChannelStrip')]
{% endraw %}

We'll grab the LeslieHorn subset:

{% raw %}
horn = path_audiomdpi / "LeslieHorn"; horn.ls()
(#4) [Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn/readme.txt'),Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn/chorale'),Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn/tremolo'),Path('/home/shawley/.fastai/data/AudioMDPI/LeslieHorn/dry')]
{% endraw %} {% raw %}
path_dry = horn /'dry'
#path_trem = horn / 'tremolo'
audio_extensions = ['.m3u', '.ram', '.au', '.snd', '.mp3','.wav']
fnames_dry = get_files(path_dry, extensions=audio_extensions)
{% endraw %} {% raw %}
waveform, sample_rate = torchaudio.load(fnames_dry[0])
{% endraw %}
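fastai's `get_files` is doing the filtering here: it recursively collects files whose suffix matches one of `audio_extensions`. A simplified pure-`pathlib` sketch of the same idea (the real `get_files` also supports folder filtering, recursion flags, etc.):

```python
import tempfile
from pathlib import Path

def get_files_by_ext(path, extensions):
    """Simplified sketch of get_files: recursively collect files whose
    suffix (case-insensitive) is in `extensions`."""
    extensions = {e.lower() for e in extensions}
    return sorted(p for p in Path(path).rglob('*')
                  if p.is_file() and p.suffix.lower() in extensions)

# Quick check on a throwaway directory:
tmp = Path(tempfile.mkdtemp())
for name in ['a.wav', 'b.mp3', 'notes.txt']:
    (tmp / name).touch()
found = get_files_by_ext(tmp, ['.wav', '.mp3'])
print([p.name for p in found])  # ['a.wav', 'b.mp3'] -- notes.txt is skipped
```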

Let's take a look at it:

{% raw %}

show_info[source]

show_info(waveform, sample_rate)

Parameters:

  • waveform : <class 'inspect._empty'>

  • sample_rate : <class 'inspect._empty'>

{% endraw %} {% raw %}
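`show_info` prints the waveform's shape, dtype, duration and basic statistics. A minimal numpy sketch of that calculation (assuming the waveform is shaped `(channels, samples)`; the real function operates on torch tensors):

```python
import numpy as np

def show_info_np(waveform, sample_rate):
    """Minimal sketch of show_info-style stats for a (channels, samples) array."""
    dur = waveform.shape[-1] / sample_rate
    print(f"Shape: {tuple(waveform.shape)}, Dtype: {waveform.dtype}, Duration: {dur} s")
    print(f"Max: {waveform.max():6.3f},  Min: {waveform.min():6.3f}, "
          f"Mean: {waveform.mean():6.3f}, Std Dev: {waveform.std():6.3f}")
    return dur

wav = np.zeros((1, 110250), dtype=np.float32)   # 2.5 s of silence at 44.1 kHz
dur = show_info_np(wav, 44100)                  # Duration: 2.5 s
```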

plot_waveform[source]

plot_waveform(waveform, sample_rate, ax=None, xlim=None, ylim=[-1, 1], color='blue', label='', title='Waveform')

Waveform plot, from https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html

Parameters:

  • waveform : <class 'inspect._empty'>

    the tensor to plot.

  • sample_rate : <class 'inspect._empty'>

    used for labeling x-axis in terms of time

  • ax : <class 'NoneType'>, optional

    can be an existing array of plot axes, or None

  • xlim : <class 'NoneType'>, optional

    limits of x-axis

  • ylim : <class 'list'>, optional

    limits of y-axis

  • color : <class 'str'>, optional

    can specify color for waveform plot

  • label : <class 'str'>, optional

    label for waveform plot

  • title : <class 'str'>, optional
{% endraw %} {% raw %}

plot_melspec[source]

plot_melspec(waveform, sample_rate, ax=None, ref=amax, vmin=-70, vmax=0)

Mel-spectrogram plot, from librosa documentation

Parameters:

  • waveform : <class 'inspect._empty'>

  • sample_rate : <class 'inspect._empty'>

  • ax : <class 'NoneType'>, optional

  • ref : <class 'function'>, optional

  • vmin : <class 'int'>, optional

  • vmax : <class 'int'>, optional

{% endraw %} {% raw %}
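The `ref`, `vmin`, and `vmax` parameters above control the decibel conversion of the spectrogram. The underlying idea, as in librosa's `amplitude_to_db`, is roughly the following (a simplified sketch; the real librosa function also handles `top_db` and a few other details):

```python
import numpy as np

def amplitude_to_db_sketch(S, ref=np.amax, amin=1e-10, vmin=-70.0, vmax=0.0):
    """Convert magnitudes to decibels relative to `ref`, clipped to
    [vmin, vmax] -- a simplified take on librosa.amplitude_to_db."""
    ref_value = ref(S) if callable(ref) else ref
    db = 20.0 * np.log10(np.maximum(amin, S) / np.maximum(amin, ref_value))
    return np.clip(db, vmin, vmax)

S = np.array([1.0, 0.1, 0.001])
print(amplitude_to_db_sketch(S))   # 0 dB, -20 dB, -60 dB
```

With `ref=np.amax` the loudest bin sits at 0 dB and everything else is negative, which is why `vmax=0` is a sensible default for the color scale.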

play_audio[source]

play_audio(waveform, sample_rate)

From the torchaudio preprocessing tutorial. Note that the IPython docs state Audio can already handle multichannel playback: "# Can also do stereo or more channels"

Parameters:

  • waveform : <class 'inspect._empty'>

  • sample_rate : <class 'inspect._empty'>

{% endraw %} {% raw %}

show_audio[source]

show_audio(waveform, sample_rate, info=True, play=True, plots=['waveform', 'melspec'], ref=500)

This display routine is an amalgam of the torchaudio tutorial and the librosa documentation.

Parameters:

  • waveform : <class 'inspect._empty'>

  • sample_rate : <class 'inspect._empty'>

  • info : <class 'bool'>, optional

  • play : <class 'bool'>, optional

  • plots : <class 'list'>, optional

  • ref : <class 'int'>, optional

{% endraw %} {% raw %}
show_audio(waveform, sample_rate)
Shape: (1, 110250), Dtype: torch.float32, Duration: 2.5 s
Max:  1.000,  Min: -0.973, Mean: -0.000, Std Dev:  0.086
{% endraw %} {% raw %}
show_audio(waveform, sample_rate, info=False, play=False, plots=['melspec'], ref=1)
{% endraw %}

Let's make a multi-channel tensor and plot it:

{% raw %}
num_channels = 5
n = waveform.shape[-1]*3
waveform2 = torch.zeros((num_channels,n))
for c in range(num_channels):
    start = int(np.random.rand()*waveform.shape[-1]*(2))
    this_waveform, _ = torchaudio.load(fnames_dry[c])
    waveform2[c, start:start+waveform.shape[-1]] = this_waveform
{% endraw %} {% raw %}
show_audio(waveform2, sample_rate)
Shape: (5, 330750), Dtype: torch.float32, Duration: 7.5 s
Max:  1.000,  Min: -1.000, Mean: -0.000, Std Dev:  0.037
{% endraw %}

Here's some code illustrating how we build our time-alignment dataset:

{% raw %}
sample = waveform[0].numpy()  # just simplify array dimensions for this demo
sample = sample[int(0.6*sample_rate):]  # chop off the silence at the front for this demo

track_length = sample_rate*5
sample_len = sample.shape[-1]
target = np.zeros(track_length)
input = np.zeros(track_length)
click = np.zeros(track_length)

grid_interval = sample_rate

n_intervals = track_length // grid_interval
for i in range(n_intervals):
    start = grid_interval*i 
    click[start] = 1                          # click track
    end = min( start+sample_len, track_length)
    target[start:end] = sample[0:end-start]  # paste the sample on the grid
    
    # mess up the paste location
    rand_start = max(0, start + np.random.randint(-grid_interval//2,grid_interval//2))
    rand_end = min( rand_start+sample_len, track_length )
    input[rand_start:rand_end] = sample[0:rand_end-rand_start]

fig = plt.figure(figsize=(14, 8))
plt.plot(target)
[<matplotlib.lines.Line2D at 0x7f737463e2b0>]
{% endraw %} {% raw %}
fig = plt.figure(figsize=(14, 8))
plt.plot(input)
[<matplotlib.lines.Line2D at 0x7f732e62cee0>]
{% endraw %} {% raw %}
fig = plt.figure(figsize=(14, 8))
plt.plot(click)
[<matplotlib.lines.Line2D at 0x7f732e62c850>]
{% endraw %}
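One way to sanity-check the misalignment between `input` and `target` (and the quantity a model would eventually learn) is cross-correlation: the lag that maximizes the correlation between a shifted copy and the original recovers the offset. A toy demo with noise, separate from the dataset code above:

```python
import numpy as np

# Toy demo: recover a known shift via cross-correlation (illustrative only).
rng = np.random.default_rng(0)
sig = rng.standard_normal(256)
shift = 17
shifted = np.zeros_like(sig)
shifted[shift:] = sig[:-shift]          # delay sig by `shift` samples

corr = np.correlate(shifted, sig, mode='full')  # lags -(N-1) .. N-1
lag = corr.argmax() - (len(sig) - 1)            # convert argmax index to a lag
print(lag)  # 17
```

The same trick applied per grid interval would estimate how far each pasted sample drifted from its click.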